Java: Working with different character encodings

Writing code that works with different character sets can be difficult.  This article lists a few of the factors you will need to consider to internationalize your application.

Use this JSP file to experiment with different character sets. It should let you do an end-to-end validation of your setup (from displaying HTML, to POST-ing data, to storing it in a database and retrieving it).

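Things you can test:

* That 'foreign' chars display correctly.
* That form posts of 'foreign' chars are received correctly (note that the page turns the posted value into ISO-8859-1 bytes before converting those bytes to a UTF-8 string).
* That 'foreign' chars can be stored in a database.
* That 'foreign' chars can be retrieved from the database.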

If your application is not behaving the way you think it should, try creating the simplest possible test case (e.g. something like this page), then adding in bits of your application until you can reproduce the problem.

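Some i18n factors to consider:

* The JSP 'page' directive. e.g.

<%@page language="java" pageEncoding="utf8" contentType="text/html; charset=UTF8"%>

* Request character encoding. e.g.

request.setCharacterEncoding("UTF8")

* Response character encoding. e.g.

response.setCharacterEncoding("UTF8")

* JDBC URL. e.g. appending

useUnicode=true&characterEncoding=UTF-8

* MySQL table default charset. e.g.

CREATE TABLE t (c CHAR(20) CHARACTER SET utf8 COLLATE utf8_bin)

* The Tomcat Connector URIEncoding value (defaults to ISO-8859-1 in Tomcat 5.5).

* Converting received bytes from one charset (typically ISO-8859-1) to another. e.g.

String convertedValue = new String(testValue.getBytes("ISO8859_1"), "UTF8")

* The Java file.encoding property. e.g. -Dfile.encoding=UTF8 on the command line.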
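The byte conversion listed above can be tried outside a servlet container. The following is a minimal sketch, not part of the original page: the class name CharsetRepair and its two methods are hypothetical, and it assumes the container decoded UTF-8 request bytes as ISO-8859-1, which is the situation the conversion is meant to undo.

```java
import java.nio.charset.StandardCharsets;

public class CharsetRepair {

    // Simulates the problem: UTF-8 bytes are decoded as if they were ISO-8859-1.
    public static String misdecode(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Reverses the problem: recover the raw bytes, then decode them as UTF-8.
    // Equivalent to new String(value.getBytes("ISO8859_1"), "UTF8").
    public static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "G\u00f6ttingen";      // "Göttingen", 9 characters
        String garbled = misdecode(original);    // the 'ö' becomes two characters
        System.out.println(garbled.length());                   // prints 10
        System.out.println(repair(garbled).equals(original));   // prints true
    }
}
```

Only apply the repair step when the value really was mis-decoded. Running it on a string that was already decoded correctly damages any non-ASCII characters in it.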

Notes:

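* Try including these params on your MySQL JDBC URL:

useUnicode=true&characterEncoding=UTF-8

* See this blog entry for info about why non ISO-8859-1 chars are not 'POST-ed' correctly: http://weblogs.java.net/blog/joconner/archive/2005/07/charset_traps.html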
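The JDBC URL parameters from the notes above can also be appended in code. This is a rough sketch under stated assumptions: the class JdbcUrlParams and the method withUnicodeParams are hypothetical helpers, not part of any driver API, and the URLs are placeholders in the style of the test page's default.

```java
public class JdbcUrlParams {

    // The parameters suggested in the notes above.
    static final String UNICODE_PARAMS = "useUnicode=true&characterEncoding=UTF-8";

    // Appends the Unicode parameters unless the URL already sets characterEncoding.
    public static String withUnicodeParams(String url) {
        if (url.contains("characterEncoding=")) {
            return url; // already configured; leave it untouched
        }
        // Use '?' for the first parameter and '&' for any later one.
        String separator = url.contains("?") ? "&" : "?";
        return url + separator + UNICODE_PARAMS;
    }

    public static void main(String[] args) {
        System.out.println(withUnicodeParams("jdbc:mysql://127.0.0.1/somedbname"));
        System.out.println(withUnicodeParams("jdbc:mysql://127.0.0.1/somedbname?user=root"));
    }
}
```

The helper only does string handling, so it can be checked without a database. Whether the driver honors the parameters still has to be verified end to end, for example with the test page.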

The Code

<%@page language="java"
pageEncoding="utf8"
%>
<%@ page import="java.sql.*"%>
<%

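// To experiment with explicit encodings, add contentType="text/html; charset=UTF8"
// to the page directive above, or uncomment one of the calls below.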
//response.setCharacterEncoding("UTF8"); 
//request.setCharacterEncoding("UTF8");
%>
<html>
<body>
<h1>The i18n Sandbox</h1>
    <p>Use this JSP file to experiment with different character sets.  It should let you do 
    an end-to-end validation of your setup (from displaying HTML, to POST-ing data, to storing it in a database 
    and retrieving it).</p>
    <p>Things you can test:</p>
    <ul>
    <li>That 'foreign' chars display correctly.</li>
    <li>That form posts of 'foreign' chars are received correctly (note that this page turns the posted value
    into ISO-8859-1 bytes before converting those bytes to a UTF-8 string).</li>
    <li>That 'foreign' chars can be stored in a database.</li>
    <li>That 'foreign' chars can be retrieved from the database.</li>
    </ul>
    
    <p>If your application is not behaving the way you think it should, try creating the simplest
    possible test case (e.g. something like this page), then adding in bits of your application
    until you can reproduce the problem.</p>
    
    <p>Some i18n factors to consider:</p>
    <ul>
    <li>The jsp 'page' tag.  e.g. <%="<"+"%@page language=\"java\" pageEncoding=\"utf8\" contentType=\"text/html; charset=UTF8\"%" + ">"%></li>
    <li>Request character encoding.  e.g. request.setCharacterEncoding("UTF8")</li>
    <li>Response character encoding.  e.g. response.setCharacterEncoding("UTF8")</li>
    <li>JDBC URL.  e.g. appending useUnicode=true&amp;characterEncoding=UTF-8</li>
    <li>MySQL table default charset.  e.g. CREATE TABLE t (c CHAR(20) CHARACTER SET utf8 COLLATE utf8_bin)</li>
    <li>The Tomcat Connector <a href="http://jakarta.apache.org/tomcat/tomcat-5.5-doc/config/http.html">URIEncoding value</a> (defaults to ISO-8859-1).</li>
    <li>Converting received bytes from one charset (typically ISO-8859-1) to another.  e.g. String convertedValue = new String(testValue.getBytes("ISO8859_1"), "UTF8")</li>
    <li>The Java file.encoding property.  e.g. -Dfile.encoding=UTF8 on the command line.</li>
    </ul>

    <p>Notes:</p>
    <ul>
    <li>Try including these params on your MySQL JDBC URL: useUnicode=true&amp;characterEncoding=UTF-8</li>
    <li>See this <a href="http://weblogs.java.net/blog/joconner/archive/2005/07/charset_traps.html">blog entry</a> for info about why
    non ISO-8859-1 chars are not 'POST-ed' correctly.</li>
    <li>Looking for some random 'foreign' characters?  How about ???????????? (should show up as Chinese characters) or Göttingen (should have a couple of dots over the 'o')?</li>
    </ul>
    
<h1>Loading....</h1>
<%
Connection c = null;
java.sql.PreparedStatement s = null;
ResultSet rs = null;
String testValue = request.getParameter("testvalue");
String url = request.getParameter("url");
String driver = request.getParameter("driver");
if(testValue==null) {
    testValue = "";
}
if (url == null)
    url = "jdbc:mysql://127.0.0.1/somedbname?user=root&password=somepass&useUnicode=true&characterEncoding=UTF-8";
if (driver == null)
    driver = "org.gjt.mm.mysql.Driver";
try { 
    do {
        if (driver == null)
            break;
        Class.forName(driver);
        String convertedValue = new String(testValue.getBytes("ISO8859_1"), "UTF8");
        %>
        <p>Request encoding = <%=request.getCharacterEncoding() %>, Response Encoding = <%=response.getCharacterEncoding() %></p>
        <p>Test Value parameter is '<%=testValue%>' (i.e. what the POST-ed value
        looks like to the servlet)</p>
        <p>Test Value converted from ISO-8859-1 bytes to UTF-8 is: <%=convertedValue %></p>
        <p>Creating connection to <%=url%></p>
        <%
        c = DriverManager.getConnection(url);
        if ("POST".equalsIgnoreCase(request.getMethod())) {
            if ("ON".equalsIgnoreCase(request.getParameter("createtable"))) {%>
                <p>Creating the test table</p>
                <%
                s = c.prepareStatement("create table utf8test(keyvalue integer, testvalue varchar(100), primary key (keyvalue));");
                s.execute();
                s.close();
            }
            %>
            <p>Setting database value to '<%=convertedValue%>'</p>
            <%
            s = c.prepareStatement("replace into utf8test(keyvalue, testvalue) values(0, ?)");
            s.setString(1, convertedValue);
            s.execute();
            s.close();
        }
        s = c.prepareStatement("select * from utf8test");
        rs = s.executeQuery();
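        // Only the first row is displayed.
        if (rs.next()) {
            testValue = rs.getString("testvalue");
            %>
            <p>Loaded value from database: '<%=testValue%>' (i.e. what the value
            looks like from the DB)</p>
            <%
        }
        rs.close();
        // Close the select statement before 's' is reused for the drop below.
        s.close();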
        if ("ON".equalsIgnoreCase(request.getParameter("droptable"))) {%>
            <p>Dropping the test table</p>
            <%
            s = c.prepareStatement("drop table utf8test;");
            s.execute();
            s.close();
        }
    } while (false);
} catch (Throwable t) {
    java.io.StringWriter sw = new java.io.StringWriter();
    java.io.PrintWriter pw = new java.io.PrintWriter(sw);
    t.printStackTrace(pw);
    %>
    <p>Failed: <%=t.getMessage()%>:</p>
    <pre>
    <%=sw.toString()%>
    </pre>
<%} finally {
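    // Close each resource on its own so one failed close() does not leak the others.
    if (rs != null) {
        try { rs.close(); } catch (SQLException ignored) { }
    }
    if (s != null) {
        try { s.close(); } catch (SQLException ignored) { }
    }
    if (c != null) {
        try { c.close(); } catch (SQLException ignored) { }
    }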
}
%>
<h2>Value to Save To DB</h2>
<form action='utf8test.jsp' method='post'>
<p><input name="createtable" type="checkbox" /> Create test table?<br />
<input name="droptable" type="checkbox" /> Drop test table?<br />
JDBC URL: <input size="80" name="url" value="<%=url %>" /><br />
Driver Name: <input size="80" name="driver" value="<%=driver%>" /><br />
Value to Store In DB: <input size="80" name="testvalue"    value='<%= testValue==null ? "" : testValue%>' /> <br/>
    <input
    type="submit" value="Update DB" /></p>
</form>
<p><a
    href="utf8test.jsp?url=<%=java.net.URLEncoder.encode(url, "UTF-8") %>&amp;driver=<%=java.net.URLEncoder.encode(driver, "UTF-8") %>">Reload</a></p>
</body>
</html>